Abstract

This paper evaluates the performance of boosted decision trees for
tagging b-jets. Using a Monte Carlo simulation of $WH \to l\nu qq$ events,
it is shown that boosted decision trees outperform feed-forward neural
networks. For a b-tagging efficiency of 60%, the light jet rejection
given by boosted decision trees is about 35% higher than that given by
neural networks.

1 Introduction 

Precision measurements in the top quark sector, and searches for the Higgs 
boson and physics beyond the Standard Model, critically depend on the good 
identification ("tagging") of jets produced by b quarks. Tagging techniques 
exploit specific properties of B-hadrons to differentiate them from the large 
background of jets produced by light quarks and gluons. The long lifetime 
of B-hadrons results in displaced vertices formed by tracks from their decays. 
Physical observables associated to these vertices constitute the input for sec- 
ondary vertex tagging. Also, tracks from B- and D-hadron decays typically 
have large impact parameters, which are frequently used to construct discrim- 
inating variables. In a different approach, soft-lepton tagging searches for low 
transverse momentum leptons inside jets, originating from semileptonic decays 
of B- and D-hadrons. The tagging performance is substantially improved when 
individual taggers are combined to give a single jet classifier. In high energy 
physics, the feed-forward neural network is one of the most popular methods 
of combining several discriminating variables into one classifier and has been
extensively applied to b-tagging.

In this paper, the capability of an alternative classification technique,
boosted decision trees, for tagging b-jets is evaluated. Using a sample of
$WH \to l\nu qq$ Monte Carlo events, the performance of boosted decision trees
and feed-forward neural networks is compared. Boosted decision trees are a
learning technique recently introduced in high energy physics for data analysis
in the MiniBooNE experiment [1]. It was found that particle identification
with boosted decision trees performs better than that with neural networks
in a Monte Carlo simulation of MiniBooNE data. This insight motivated
the studies reported here, which indicate that boosted decision trees are also a
promising technique for tagging b-jets.

In the next section, a brief description of the boosted decision trees
algorithm is given. The Monte Carlo simulation used in this analysis is explained
in Section 3. Section 4 describes the discriminant variables which feed the
tagging algorithms. The tagging performances of boosted decision trees and
neural networks are compared in Section 5. Finally, conclusions are given in
Section 6.



2 Boosted decision trees 

The boosted decision trees algorithm implemented in this analysis starts with a
parent node containing a training set of b-jet and u-jet patterns. All jets in the
first tree iteration are given the same weight $w_i^{(1)}$, such that the sum of weights
equals 1. Then, the algorithm loops over all binary splits in order to find the
discriminating variable and corresponding separation value that optimize a
given figure of merit. For instance, in Figure 1 the optimal figure of merit is
obtained when the jets are divided between those that have a secondary vertex
mass greater than 1 GeV/$c^2$ and those that do not. This procedure is then
repeated for the new daughter nodes until a stopping criterion is satisfied.

A node is called "signal node" if the sum of the weights of b-jets is greater 
than the sum of the weights of u-jets. Otherwise, it is called "background 
node". A b-jet (u-jet) is correctly classified if it lands on a signal (background) 
node. If p designates the fraction of correctly classified jets in a node, its Gini
index is defined to be $Q(p) = -2p(1-p)$. The optimal discriminating variable
and separation value are the ones which maximize the figure of merit

$$Q_{\mathrm{split}} = \frac{w_L Q(p_L) + w_R Q(p_R)}{w_L + w_R} , \qquad (1)$$

where $w_L$ and $w_R$ are the sums of the jet weights in the left and right daughter
nodes, respectively, and $Q(p_L)$ and $Q(p_R)$ are the Gini indices of the left and
right daughter nodes.

Figure 1: Example of a decision tree.

A node is not split if the optimal $Q_{\mathrm{split}}$ is smaller than its
own $Q(p)$ or, alternatively, if it contains fewer events than a prespecified limit.
Unsplit nodes are called "leaves", which are depicted as rectangles in Figure 1.
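
As an illustration, the search for the optimal separation value on a single discriminating variable can be sketched in Python as follows. This is a minimal sketch and not the implementation used in this analysis: the function names, the toy vertex masses and the choice of scanning midpoints between sorted values are all assumptions, and p is taken to be the weighted fraction of jets that agree with the majority class of the node.

```python
def gini(p):
    """Gini index Q(p) = -2p(1-p), p being the correctly classified fraction."""
    return -2.0 * p * (1.0 - p)

def node_purity(weights, labels):
    """Weighted fraction of jets matching the node's majority class (1 = b-jet)."""
    total = sum(weights)
    signal = sum(w for w, y in zip(weights, labels) if y == 1)
    return max(signal, total - signal) / total

def q_split(values, labels, weights, cut):
    """Figure of merit of the split 'value > cut'; None if a daughter is empty."""
    left = [(w, y) for v, y, w in zip(values, labels, weights) if v <= cut]
    right = [(w, y) for v, y, w in zip(values, labels, weights) if v > cut]
    if not left or not right:
        return None
    numerator = 0.0
    for side in (left, right):
        ws = [w for w, _ in side]
        ys = [y for _, y in side]
        numerator += sum(ws) * gini(node_purity(ws, ys))
    return numerator / sum(weights)

def best_cut(values, labels, weights):
    """Scan midpoints between distinct sorted values; return (Q_split, cut)."""
    xs = sorted(set(values))
    cuts = [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
    if not cuts:
        return None  # a single distinct value cannot be split
    return max((q_split(values, labels, weights, c), c) for c in cuts)

# Toy secondary vertex masses in GeV/c^2 (invented for illustration).
masses = [0.25, 0.5, 0.75, 1.25, 2.0, 3.0]
is_b = [0, 0, 0, 1, 1, 1]
w = [1.0 / 6] * 6
print(best_cut(masses, is_b, w))  # the selected cut is 1.0 for this toy sample
```

In a full tree, the same scan would be repeated over every discriminating variable and the split with the largest figure of merit would be kept.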
After the kth tree is built, the jet weights are updated. There are several
methods to accomplish this. Here, we will consider the AdaBoost algorithm [2].
First, the total misclassification error $\epsilon_k$ of the tree is calculated:

$$\epsilon_k = \sum_i w_i^{(k)} I_i^{(k)} , \qquad (2)$$

where i loops over all jets in the training sample and $I_i^{(k)}$ is an indicator
function which is equal to 1 if the ith jet was misclassified or equal to 0 if
the ith jet was correctly classified. Then, the weights of misclassified jets are
increased (boosted)

$$w_i^{(k+1)} = \frac{w_i^{(k)}}{2\epsilon_k} , \qquad (3)$$

while the weights of correctly classified jets are decreased

$$w_i^{(k+1)} = \frac{w_i^{(k)}}{2(1-\epsilon_k)} . \qquad (4)$$

Finally, tree k + 1 is constructed using the new weights $w_i^{(k+1)}$.
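
The weight update can be illustrated with a short Python sketch. It assumes the normalization in which misclassified weights are divided by twice the misclassification error and correctly classified weights by twice its complement, so that the weights keep summing to 1; the function name and the toy inputs are hypothetical.

```python
def adaboost_update(weights, misclassified):
    """Return (eps_k, new_weights) for one boosting step.

    weights       -- current jet weights, assumed to sum to 1
    misclassified -- booleans, True where tree k misclassified the jet
    """
    # Misclassification error: sum of the weights of misclassified jets.
    eps = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0.0 < eps < 1.0:
        raise ValueError("update is undefined for eps = 0 or eps = 1")
    new_weights = [
        w / (2.0 * eps) if m else w / (2.0 * (1.0 - eps))
        for w, m in zip(weights, misclassified)
    ]
    return eps, new_weights

# Four jets with equal weights, the first one misclassified.
eps, updated = adaboost_update([0.25] * 4, [True, False, False, False])
print(eps, updated)
```

With this normalization the misclassified jets and the correctly classified jets each carry half of the total weight after the update, which forces the next tree to concentrate on the jets that were previously misclassified.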

After M trees are trained, their performance can be evaluated with a testing
sample of jets. The final score of jet i is a weighted sum of the scores over the
individual trees

$$f_i = \sum_{k=1}^{M} \beta_k f_i^{(k)} , \qquad (5)$$

where

$$\beta_k = \log\left(\frac{1-\epsilon_k}{\epsilon_k}\right) , \qquad (6)$$

and $f_i^{(k)} = 1\,(-1)$ if the kth tree makes the jet land on a signal (background)
leaf. Therefore, b-jets will have large positive scores, while u-jets will have
large negative scores. Trees with lower misclassification errors $\epsilon_k$ are given
more weight when the jet score is calculated. Further details of the AdaBoost 
algorithm can be found in [3]. 
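
The combination of the individual trees into a final jet score can be sketched as follows. This is a minimal illustration under the assumption that each tree weight is the logarithm of the ratio between the correctly classified and misclassified weight fractions; the function names and the example errors are hypothetical.

```python
import math

def tree_weight(eps):
    """Weight of a tree with misclassification error eps (0 < eps < 1)."""
    return math.log((1.0 - eps) / eps)

def jet_score(tree_outputs, tree_errors):
    """Weighted sum of per-tree outputs (+1 signal leaf, -1 background leaf)."""
    return sum(tree_weight(e) * f for f, e in zip(tree_outputs, tree_errors))

# Three trees: the two most accurate ones place the jet on a signal leaf.
errors = [0.2, 0.3, 0.4]
print(jet_score([+1, +1, -1], errors))  # positive, so the jet looks b-like
```

A tree with an error of 0.5 receives zero weight, which reflects the fact that it carries no more information than a random guess.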



3 Monte Carlo simulation 

The studies described in this paper were done with events generated with 
PYTHIA 6.319 [1]. We considered the environment of the LHC collider, in 
which pp interactions with a center-of-mass energy of 14 TeV are produced. 
One of the benchmark channels for b-tagging studies at the LHC is the associated
WH production. We generated WH events with $m_H = 120$ GeV/$c^2$, the
W boson decaying leptonically, $W \to l\nu$, and the Higgs boson decaying to
quark pairs, $H \to qq$. Initial and final state radiation and multiple interactions
were included in the simulation. 

Tracks are parametrized by the following set of 5 parameters: $d_0$, $z_0$, $\phi$,
$\cot\theta$ and $1/p_T$. The transverse impact parameter $d_0$ is the distance of closest
approach of the track to the primary vertex in the plane perpendicular to the
beam-line. The longitudinal impact parameter $z_0$ is the component along the
beam-line of the distance of closest approach. The parameters $\phi$ and $\theta$ are the
azimuthal and polar angles of the track, respectively, and $1/p_T$ is the inverse
of the particle transverse momentum.

In order to simulate measurement errors, these parameters were smeared
with Gaussian resolution functions. The transverse and longitudinal impact
parameters were smeared with standard deviations $\sigma_{d_0} = 10$ μm and
$\sigma_{z_0} = 100$ μm, the angle $\phi$ with $\sigma_\phi = 0.10$ mrad, $\cot\theta$ with
$\sigma_{\cot\theta} = 0.001$ and the inverse of the transverse momentum with
$\sigma_{1/p_T} = 0.001$ GeV$^{-1}$. The primary vertex positions were smeared with
Gaussian resolution functions with $\sigma_x = \sigma_y = 50$ μm and $\sigma_z = 100$ μm.
A jet is formed by all stable particles inside a cone
$\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} < 0.4$ around its axis, where
$\eta = -\log(\tan(\theta/2))$ is the track pseudorapidity.
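
The smearing and the cone definition can be illustrated with the following Python sketch. It is not the simulation code used in this analysis: the dictionary layout, the function names and the choice of units (micrometres for the impact parameters, radians for the angle) are assumptions, and the azimuthal difference is wrapped into the interval [-π, π) as is customary.

```python
import math
import random

# Gaussian resolutions quoted in the text.
SIGMA = {
    "d0": 10.0,         # transverse impact parameter, micrometres
    "z0": 100.0,        # longitudinal impact parameter, micrometres
    "phi": 0.10e-3,     # azimuthal angle, radians (0.10 mrad)
    "cot_theta": 0.001,
    "inv_pt": 0.001,    # inverse transverse momentum, GeV^-1
}

def smear_track(track, rng):
    """Add an independent Gaussian error to each of the 5 track parameters."""
    return {name: value + rng.gauss(0.0, SIGMA[name])
            for name, value in track.items()}

def pseudorapidity(theta):
    """eta = -log(tan(theta / 2)) for a polar angle theta in (0, pi)."""
    return -math.log(math.tan(theta / 2.0))

def delta_r(eta1, phi1, eta2, phi2):
    """Cone distance, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
true_track = {"d0": 150.0, "z0": -300.0, "phi": 1.2,
              "cot_theta": 0.5, "inv_pt": 0.05}
print(smear_track(true_track, rng))

# A particle belongs to the jet if it lies within the cone of radius 0.4.
print(delta_r(0.30, 0.00, 0.00, 0.40) < 0.4)  # False: the distance is 0.5
```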



4 Discriminant variables 

The physical observables used for discrimination between b-jets and light jets 
are taken from well known "spatial" b-tagging algorithms. Physical observ- 
ables from tagging techniques based on soft leptons are not considered in this 
analysis. Only jets with $p_T > 15$ GeV/c and $|\eta| < 2.5$ are considered taggable.

4.1 Impact parameter tag 

Due to the long decay distances traveled by B-hadrons, tracks from b-jets 
have on average larger impact parameters than tracks from light jets, since 
sizeable impact parameters in light jets are exclusively due to measurement 
errors. Therefore, the impact parameter of jet tracks can be used to build a 
useful variable for discrimination between b-jets and light jets. Figure 2 shows
the distributions of (a) signed transverse impact parameter significances
$S_{d_0} = d_0/\sigma_{d_0}$ and (b) signed longitudinal impact parameter significances
$S_{z_0} = z_0/\sigma_{z_0}$ of tracks in b-jets (solid line) and u-jets (dashed line).
A positive (negative) sign is assigned to the impact parameter if the track
intersects the jet axis in front of (behind) the primary vertex. These distributions
give likelihood functions $b(S)$ and $u(S)$ for a track to belong to a b-jet or a u-jet,
respectively. A jet weight is defined as the sum of the log-likelihood ratio over
all tracks in the jet:

$$W_{\mathrm{jet}} = \sum_{i \in \mathrm{jet}} \log\frac{b(S_i)}{u(S_i)} . \qquad (7)$$

Figure 3 shows the distributions of jet weights for u-jets and b-jets.
Because the transverse impact parameter has better resolution, it yields greater 
discrimination power. A given efficiency for selecting b-jets is obtained by 
selecting jets with weights above some threshold level. Obviously, for moderate 
or high selection efficiencies there will always be some contamination with light 
jets. 
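
The construction of the jet weight can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the likelihood functions are approximated by normalized histograms of track significances, empty bins are given a small floor to keep the logarithm finite, and the binning, the function names and the toy significances are all invented for illustration.

```python
import math

def make_pdf(samples, lo, hi, nbins, floor=1e-6):
    """Histogram-based likelihood; out-of-range values go to the edge bins."""
    width = (hi - lo) / nbins

    def bin_index(s):
        return min(max(int((s - lo) / width), 0), nbins - 1)

    counts = [0] * nbins
    for s in samples:
        counts[bin_index(s)] += 1
    # The floor (an assumption) avoids log(0) in unpopulated bins.
    probs = [max(c / len(samples), floor) for c in counts]
    return lambda s: probs[bin_index(s)]

def jet_weight(significances, b_pdf, u_pdf):
    """Sum of log-likelihood ratios over the tracks of one jet."""
    return sum(math.log(b_pdf(s) / u_pdf(s)) for s in significances)

# Toy training significances: b-jet tracks are shifted to positive values.
b_pdf = make_pdf([3.0, 4.0, 5.0, 6.0, 4.0, 5.0, 0.5], -10.0, 10.0, 10)
u_pdf = make_pdf([-1.0, 0.0, 1.0, 0.0, -1.0, 1.0, 4.5], -10.0, 10.0, 10)

print(jet_weight([4.5, 5.0], b_pdf, u_pdf))  # positive: b-like jet
print(jet_weight([0.0, 0.2], b_pdf, u_pdf))  # negative: u-like jet
```

Selecting jets with a weight above a threshold then gives the b-jet efficiency associated with that threshold.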

4.2 Secondary vertex tag 

An alternative approach for building b-tagging discriminating variables consists
of reconstructing displaced secondary vertices from B- and D-hadron decays inside
the jet. Secondary vertices were reconstructed with Billoir and Qian's fast
vertex fitting algorithm [5]. For purposes of secondary vertex b-tagging the
exact topology of the secondary vertex is irrelevant and, therefore, an inclusive
vertex search is performed. All jet tracks with large transverse impact parameter
significance participate in the vertex fit and vertices compatible with
$V^0$ decays are rejected. Figure 4(a) shows the decay distance significance for b-jets
and u-jets for good quality vertices. Besides the decay distance significance,
other variables associated with the secondary vertex may have discrimination
power, such as the vertex mass (Figure 4(b)) and the ratio between the absolute
momentum sum of tracks in the secondary vertex and that of all tracks
in the jet (Figure 4(c)).

Figure 2: (a) Transverse and (b) longitudinal impact parameter significances
for tracks in b-jets (solid line) and u-jets (dashed line).

4.3 One-prong tag 

For one-prong decays of B- and D-hadrons the secondary vertex fit fails. In this
situation, though, some information can still be extracted from tracks in the
jet. For instance, the maximal transverse and longitudinal impact parameters
of jet tracks clearly have discrimination power, as can be observed in Figure 5.

Figure 3: Jet weight distributions given by the (a) transverse impact parameter
and (b) longitudinal impact parameter. The solid (dashed) line corresponds
to b-jets (u-jets).

5 Results 

Boosted decision trees were implemented using the StatPatternRecognition
package [3]. The trees were fed with the 7 discriminant variables described in
the previous section and were trained with 50000 b-jet patterns and 50000
u-jet patterns. An unbiased evaluation of the boosted decision trees performance
is obtained using a distinct sample of b-jet and u-jet patterns (test sample).
The best results were obtained with a minimum number of jets per leaf of about
7000. The performance improves with an increasing number of trees, but no
significant improvement was observed after several hundred tree iterations.
Figure 6(a) shows the jet scores, normalized to be within the interval [0, 1], for
the test sample of b-jets (solid line) and u-jets (dashed line).

Figure 4: (a) Decay distance significance of the secondary vertex. (b) Invariant
mass of tracks associated with the secondary vertex. (c) Fraction of jet momentum
in the secondary vertex. The solid (dashed) line corresponds to b-jets
(u-jets).

Figure 5: Maximal (a) transverse and (b) longitudinal impact parameter significances
in jets. The solid (dashed) line corresponds to b-jets (u-jets).

In order to compare the performance of boosted decision trees with the neural
network approach, a feed-forward neural network was implemented using
the Multi-Layer Perceptron class [6] provided by the data analysis framework
ROOT [7]. The architecture of the network consisted of 7 nodes in the input
layer (corresponding to the 7 discriminant variables mentioned above), 8
nodes in a single hidden layer and 1 node in the output layer. The network was
trained with the Broyden-Fletcher-Goldfarb-Shanno learning method with a
learning rate parameter $\eta = 0.1$. The training set consisted of 100000 jet
patterns, of which 50000 were b-jets and 50000 were u-jets. Since the magnitudes
of the discriminant variables differ considerably, which may affect the performance
of the neural network, all input variables were normalized. The number
of epochs (training cycles) was 1000. Care was taken to prevent overtraining
the network by monitoring the evolution of the learning curve. Figure 6(b)
shows the jet scores given by the neural network for a test sample of b-jet
(solid line) and u-jet (dashed line) patterns.

Figure 6: Jet scores given by (a) boosted decision trees and (b) a neural
network, for b-jets (solid line) and u-jets (dashed line).

Jets with a score above some specified threshold value are tagged as b-jets.
The threshold value is contingent on the desired efficiency for tagging b-jets
$\epsilon_b = N_b^{\mathrm{tag}}/N_b$, where $N_b$ is the number of b-jets in the data and
$N_b^{\mathrm{tag}}$ is the number of tagged b-jets, or, alternatively, on the tolerated
level of contamination by light jets. Figure 7 shows the light jet rejection,
$R_u = \epsilon_u^{-1}$, where $\epsilon_u$ is the fraction of u-jets tagged as b-jets, as a
function of the b-tagging efficiency $\epsilon_b$, for the test sample, given by boosted
decision trees (black circles) and the feed-forward neural network (gray squares).
For high b-tagging efficiencies there is no significant improvement of the
performance of boosted decision trees relative to the neural network. However,
for moderate b-tagging efficiencies, boosted decision trees clearly outperform
neural networks. For a b-tagging efficiency of 60%, the light jet rejection given by
boosted decision trees is about 35% higher than that given by neural networks.

Figure 7: Light jet rejection as a function of b-jet efficiency given by boosted
decision trees (black circles) and a feed-forward neural network (gray squares).
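
For illustration, one point of such a rejection curve can be computed from two lists of jet scores as sketched below. This is a hypothetical helper, not the code used to produce Figure 7: the threshold is taken from the ranked b-jet scores, jets with a score at or above the threshold are counted as tagged, and the toy scores are invented.

```python
def rejection_at_efficiency(b_scores, u_scores, target_eff):
    """Return (threshold, b-tagging efficiency, light jet rejection)."""
    ranked = sorted(b_scores, reverse=True)
    # Number of b-jets to keep in order to approximate the target efficiency.
    n_keep = max(1, round(target_eff * len(ranked)))
    threshold = ranked[n_keep - 1]
    eff_b = sum(s >= threshold for s in b_scores) / len(b_scores)
    eff_u = sum(s >= threshold for s in u_scores) / len(u_scores)
    # The rejection is the inverse of the light jet mistag fraction.
    rejection = float("inf") if eff_u == 0 else 1.0 / eff_u
    return threshold, eff_b, rejection

# Toy scores normalized to [0, 1].
b_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
u_scores = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.45, 0.65, 0.75]
print(rejection_at_efficiency(b_scores, u_scores, 0.6))
```

Evaluating this function for both classifiers at the same target efficiency gives the two rejections whose ratio is quoted in the text.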

6 Conclusions 

The studies presented in this paper indicate that boosted decision trees
outperform neural networks for tagging b-jets, using a Monte Carlo simulation
of $WH \to l\nu qq$ events and sensible physical observables as discriminating
variables. For a b-tagging efficiency of 60%, the light jet rejection given by
boosted decision trees is about 35% higher than that given by the neural
network approach. Although encouraging, these results should be complemented
with studies performed with a full simulation in which detector inefficiencies
are considered. Also, the performance of both techniques may differ if other
physics channels are considered, since it may be affected by jet overlaps and
gluon splitting into $b\bar{b}$ pairs.